# EECS 2021

## **PROJECT MILESTONE 2**

**Priyankkumar Patel**

**Harpreet Goraya** 

**Kishan Patel** 

**Kairav Naik** 

## **Table of Contents**

- PART A: Overview
- PART B: Module-Wise Explanation
- PART C: Explanation of the System
- PART D: Implementation of RISC-V Instructions
- PART E: Added Instruction
- PART F: Functionality Of Relevant Modules
- PART G: Project Breakdown
- PART H: Conclusion

## **PART A: Overview**

This project traces how a RISC-V instruction is processed by the CPU through six states: **Fetching, Decoding, Control, Executing, Writeback, and Change PC**. The design is laid out in the CPU\_top module, and the control module acts as the brain of the project. Together they show the complete path of an instruction from fetch to execution. The project has eight modules in total: pc, registers, ALU, inst\_ROM, inst\_decoder, control, CPU\_top and data\_RAM. The following sections describe each module in detail and explain how the system works as a whole.

## **PART B: Module-Wise Explanation**

#### • pc module:

This module has a total of 6 inputs and 2 outputs. The inputs are:

- a. rst (reset)
- b. inc\_pc (increment pc)
- c. Jump
- d. Branch
- e. jump\_addr (jump address)
- f. branch\_addr (branch address)

Based on these inputs we get two outputs:

- g. pc\_addr (pc address)
- h. pc\_addr\_plus (pc address incremented by 1)

This module determines the next address of the PC (Program Counter), depending on whether the current instruction is a jump, a branch, or neither. If the program is resetting, the PC address is set to 0 (32 bits). Otherwise, the module checks whether a jump or a branch is being taken. If it is a jump instruction, the PC address is updated to jump\_addr (the jump target address). If it is a branch instruction, the PC address is updated to branch\_addr (the branch target address). If the current instruction is neither of the two, the PC address is incremented by 1, that is, it is updated to pc\_addr\_plus. Here is an illustration of this module:
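The selection logic described for the pc module can also be sketched as a small behavioural model in Python. This is only an illustration and not the project's Verilog: the function name `next_pc` is hypothetical, the clocking by inc\_pc is left out, and checking jump before branch is an assumption, since the text does not say which of the two has priority.

```python
MASK32 = 0xFFFFFFFF  # the PC address is 32 bits wide


def next_pc(pc_addr, rst, jump, branch, jump_addr, branch_addr):
    """Return the next PC address for the given control inputs."""
    if rst:
        return 0                      # reset forces the PC to 0
    if jump:                          # assumption: jump is checked first
        return jump_addr & MASK32
    if branch:
        return branch_addr & MASK32
    return (pc_addr + 1) & MASK32     # pc_addr_plus: PC incremented by 1
```

For a plain instruction such as ADD, `next_pc(7, False, False, False, 0, 0)` simply moves to address 8.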



#### • ALU module:

This is the arithmetic logic unit (ALU) of the CPU, where all arithmetic and logical operations are performed. The unit has 4 inputs and 2 outputs. The inputs are the two operands (32 bits each), a command (5 bits) that selects the operation to perform on them, and ALUenable. Depending on the command (ADD, SUB, SLL, XOR or OR), the result is written to ALU\_result. The command is chosen by the control unit of the CPU. When the control unit is in the "control" state, it determines the type of execution for the current instruction and selects the matching ALU operation. For example, if the instruction seen in the control state is an addition, then in the execution state the ALU module performs the ALUADD operation and places the result on the ALU\_result wire. Each command is 5 bits wide, and each operation has its own unique code. If the result is zero, then ALUzero will store 32'b000000000. Diagrammatically we can show this module as:
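As a complement to the description of the ALU, here is a minimal behavioural sketch in Python of the five operations listed above. It is an assumption-laden model rather than the project's code: operations are selected by name because the report does not list the 5-bit codes of these commands, ALUenable is omitted, and the zero output is modelled as a simple flag.

```python
MASK32 = 0xFFFFFFFF  # operands and results are 32 bits wide

# Hypothetical lookup table: the real module selects by a 5-bit command.
OPERATIONS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "SLL": lambda a, b: a << (b & 0x1F),  # RISC-V shifts use the low 5 bits
    "XOR": lambda a, b: a ^ b,
    "OR": lambda a, b: a | b,
}


def alu(command, data1, data2):
    """Return (result, is_zero) for one operation on 32-bit operands."""
    result = OPERATIONS[command](data1 & MASK32, data2 & MASK32) & MASK32
    return result, result == 0
```

The masking step keeps results inside 32 bits, so a subtraction such as 0 - 1 wraps around to 0xFFFFFFFF as it would in hardware.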



#### • inst\_ROM module

This module is read-only, so it is a ROM. When the load instruction control signal from the control unit is 1, the module outputs the instruction indexed by inst\_addr, which comes from the pc module. Otherwise, inst is updated with the current instruction based on its address. In short, this module reads the instruction stored at the instruction address.



#### • inst\_decoder module

The instruction must be decoded before the ALU can operate on it, both to extract its operands and to identify what type of instruction it is. This module takes inst (the instruction) as its input from the inst\_ROM module, after the ROM has read the instruction at the given address. Every instruction is a 32-bit value in machine language, and those 32 bits are partitioned into opcode, rs1, rs2, rd, function code and so on, depending on the instruction type. The decoder first takes the 7 least significant bits of the instruction, which are the opcode, and uses them to determine whether the instruction is I-type, S-type, R-type, UJ-type or SH-type. After checking the opcode, the decoder checks bits 12 to 14, which hold the 3-bit function code of the operation. Once the opcode and function code are known, the rs1, rs2, rd or imm values of that instruction can be extracted. inst\_decoder has a total of 7 outputs: rr1, rr2, wr, ALU\_data2, branch address, jump address and execution. rr1 and rr2 correspond to rs1 and rs2, and wr is the target register (rd) in which the result of the operation (ADD, SUB, LW, SLLI, SLL, XOR, OR, JAL) is stored. The branch address and jump address are driven only when the opcode is 1100011 (branch) or 1101111 (jump and link). The netlist viewer shows a complicated diagram of this module, so here is a simplified diagram:
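The field partitioning that the decoder relies on can be illustrated with a short Python sketch. It uses the standard RISC-V R-type bit positions; the function name `decode_fields` and the dictionary keys are hypothetical and are not taken from the project's Verilog.

```python
def decode_fields(inst):
    """Split a 32-bit R-type instruction word into its fields."""
    return {
        "opcode": inst & 0x7F,          # bits 0 to 6
        "rd": (inst >> 7) & 0x1F,       # bits 7 to 11   (wr)
        "funct3": (inst >> 12) & 0x7,   # bits 12 to 14
        "rs1": (inst >> 15) & 0x1F,     # bits 15 to 19  (rr1)
        "rs2": (inst >> 20) & 0x1F,     # bits 20 to 24  (rr2)
        "funct7": (inst >> 25) & 0x7F,  # bits 25 to 31
    }


# Instruction word taken from the decoder table in Part F.
fields = decode_fields(0x002081B3)
print(fields)
```

For that word the sketch yields opcode 0110011 with rs1 = 1, rs2 = 2 and rd = 3.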



#### • Control module

This module is also called the brain of the CPU we designed. It has a total of 9 inputs:

- a. execution (10 bits)
- b. clk (clock)
- c. rst (reset)
- d. ALU\_data2
- e. rd2
- f. ALUzero
- g. pc\_addr\_plus
- h. ALUresult
- i. rd\_data

The control unit handles and processes each state of the CPU. It first checks whether the program is resetting; if so, everything is set to zero. Otherwise it acts according to the current state:

- **Fetch:** the control unit loads one instruction from the inst\_ROM module and changes the state to decoding.
- **Decoding:** op\_reg is updated to 110000000 so that the fetched instruction is decoded by the inst\_decoder module, and the state is updated to control.
- **Control:** the type of instruction has been determined by the decoder, and the control unit sets its signals accordingly. For example, if the instruction is LW or SW, ALUsrc is set to 1 because the second ALU input is taken from ALU\_data2. The ALU command is also set to the command that matches the instruction.
- **Executing:** using the inputs chosen in the control state, the instruction is executed by the ALU module with the operation selected by ALUcommand.
- **Writeback:** based on the executed data and the instruction, the control unit selects what is read and what is fed to the register write input.
- **Change PC:** op\_reg writes the result into the register file, the Program Counter is incremented for the next instruction, and the state returns to fetch.

This is how our control unit works.
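The order in which the control unit moves through its states can be sketched as a tiny state machine in Python. The state names follow the report, but the `State` enum, its numeric values and the `step` function are hypothetical, and returning to fetch on reset is an assumption (the text only says that everything becomes zero).

```python
from enum import Enum


class State(Enum):
    FETCH = 0
    DECODING = 1
    CONTROL = 2
    EXECUTING = 3
    WRITEBACK = 4
    CHANGE_PC = 5


# Each state hands over to the next one; change pc wraps back to fetch.
NEXT_STATE = {
    State.FETCH: State.DECODING,
    State.DECODING: State.CONTROL,
    State.CONTROL: State.EXECUTING,
    State.EXECUTING: State.WRITEBACK,
    State.WRITEBACK: State.CHANGE_PC,
    State.CHANGE_PC: State.FETCH,
}


def step(state, rst=False):
    """Advance the control unit by one state (reset goes back to fetch)."""
    return State.FETCH if rst else NEXT_STATE[state]
```

Calling `step` six times starting from `State.FETCH` walks through one complete instruction and ends at fetch again.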



#### • CPU\_top module

This module describes the entire design of the project. The RTL netlist viewer shows how every module of the project is connected to the others. CPU\_top contains no logic of its own; instead, all seven module instances are created inside it. It is similar to a Java interface, which consists only of method headers: the instances act like headers for the various modules, and CPU\_top calls them and processes them accordingly.



#### • register module

This module creates a register file to store the registers. It acts as a combination of two multiplexers: based on the inputs rr1 and rr2, the contents of the selected registers are placed on the outputs rd1 and rd2.
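A register file with two read selectors can be modelled in a few lines of Python. This is a sketch under assumptions: the class name `RegisterFile`, the count of 32 registers (matching the 5-bit register fields of RISC-V) and the `write` method with its `regwrite` argument are illustrative and are not taken from the project's source.

```python
MASK32 = 0xFFFFFFFF


class RegisterFile:
    """Behavioural model: two read ports and one write port."""

    def __init__(self, count=32):
        self.regs = [0] * count

    def read(self, rr1, rr2):
        # The two "multiplexers": each selector picks one register.
        return self.regs[rr1], self.regs[rr2]

    def write(self, wr, data, regwrite):
        # Only update the target register when regwrite is asserted.
        if regwrite:
            self.regs[wr] = data & MASK32
```

Reading with `rr1 = 3` and `rr2 = 4` returns the current contents of registers 3 and 4 as the pair (rd1, rd2).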





#### • data\_RAM module

This module is used when regwrite is required, that is, when the result of an operation has to be written back to the target register. The address and the write data (wdata) are fed to the RAM module together with the control signals mem\_wr (memory write) and mem\_rd (memory read). If the instruction sets mem\_rd to 1, rd\_data (the read data) is assigned the value stored at the given address. Whenever mem\_wr has a rising edge, the RAM stores wdata at the location indexed by the address.
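The read and write behaviour of the RAM can be sketched as follows. This Python model is an approximation: the class name `DataRAM`, the `tick` method and the size of 256 words are hypothetical, and the rising edge of mem\_wr is detected by remembering the previous value of the signal.

```python
MASK32 = 0xFFFFFFFF


class DataRAM:
    """Behavioural model: write on a rising edge of mem_wr, read on mem_rd."""

    def __init__(self, size=256):  # size is an illustrative choice
        self.mem = [0] * size
        self._prev_mem_wr = 0

    def tick(self, addr, wdata, mem_wr, mem_rd):
        # Store wdata only when mem_wr goes from 0 to 1.
        if mem_wr and not self._prev_mem_wr:
            self.mem[addr] = wdata & MASK32
        self._prev_mem_wr = mem_wr
        # rd_data is only driven when mem_rd is asserted.
        return self.mem[addr] if mem_rd else None
```

Holding mem\_wr high across two calls writes only once, which mirrors the edge-triggered behaviour described above.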



## **PART C: Explanation of the System**

Creating a processor requires putting together different components, as we have learned in the course. Using abstraction, we can break the whole system into smaller subsystems, which makes the code easier to read and understand while also giving us the ability to add more functions in the future.

The following is a simplified overview of the CPU and how it behaves:

Firstly, the control module is the "brain" of our CPU because it tells our modules what to process and when.



Once the machine code has been entered into the system, it resides in the ROM module, and the control unit tells the **decoder** to process that instruction. The decoder extracts the registers, the type of instruction and the branch address from the machine code and sends them to different parts of the CPU.



Some of these parts, such as the registers, are sent to the register module. This module stores the registers so they can be accessed later.



The jump address and branch address are sent to the PC (Program Counter), which basically acts as a memory operator. It can move through the memory and provide the memory address we need, depending on the instruction.



The next step depends on the control module: based on the instruction read, it updates the ALU command to the specific command encoded in the instruction and also sends the data required by the ALU.



The ALU module is where the arithmetic actually happens. As mentioned, the ALU takes its command and data from the control module and the register module, performs the arithmetic, and sends the result back to the control module.



Back in the control module, many different actions can be performed, such as storing the result into the register module, jumping or branching to a different memory address, performing an ALU command again, or simply outputting the result.

The following is the full diagram which shows the connectivity of all the modules.



## **PART D: Implementation of RISC-V Instructions**

The instructions implemented in this part are all R-type instructions.

The format for the R-type instruction is:

7-bit funct7 | 5-bit rs2 | 5-bit rs1 | 3-bit funct3 | 5-bit rd | 7-bit opcode

**NOTE:** The register reg [8:0] op\_reg in the control unit holds all 9 control signals used in our project.

The first instruction that we will implement is:

We will first generate its machine code:

0000000\_00110\_00101\_100\_01001\_0110011
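To make the link between the R-type layout and this machine code explicit, the word can be assembled field by field. The Python helper `encode_r_type` below is hypothetical and only illustrates the bit packing; reading the fields with the standard RISC-V register numbering gives funct3 = 100 (XOR), rs1 = x5, rs2 = x6 and rd = x9.

```python
R_TYPE_OPCODE = 0b0110011


def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode=R_TYPE_OPCODE):
    """Pack the six R-type fields into one 32-bit instruction word."""
    return (
        (funct7 << 25)
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (rd << 7)
        | opcode
    )


# Fields read off the machine code above, from left to right.
word = encode_r_type(0b0000000, 0b00110, 0b00101, 0b100, 0b01001)
print(f"{word:032b}")
print(f"{word:08X}")
```

The packed word matches the 32-bit pattern above, which corresponds to 0062C4B3 in hexadecimal.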

When the instruction is fed to inst\_decoder, the decoder determines the type of instruction and the type of execution. It first looks at the 7 least significant bits (LSBs) to identify the instruction type. Here they are **0110011**, which indicates an R-type instruction. Next it reads the function 3 bits (bits 12 to 14), which are 100, and the function 7 bits (bits 25 to 31), which are **0000000**. Together these determine the type of execution, which is **XOR** in this case. So far we know that this is an R-type instruction with an XOR execution. XOR has a target register and two source registers (read register 1 and read register 2) on which it operates.

After the instruction is decoded, the control unit moves to the control state, which gives the ALU module its data and the command to perform. The control state has one case per execution, so with XOR as the execution it selects the XOR case. XOR does not need an immediate value; its second input is a source register, so the control state selects rd2 as the second input. It also assigns the ALU command and updates the state to executing. In the executing state, the control unit triggers the ALUenable control signal. The ALU module performs the XOR operation on the rising edge of ALUenable and stores the result in ALUresult (32 bits). With the execution done, the state becomes writeback. Because the result is in ALUresult, the writeback state selects the output of ALUresult as the input of the register write by triggering the regwrite control signal. Recall that the outputs of inst\_decoder (rr1, rr2, wr) are fed as inputs to the register module. The state is then updated to change pc, because the rest of the program still has to be processed.

Finally, based on the type of execution, the change pc state writes the result to the register file when it sets the mem\_wr control signal to 1. This is how the XOR instruction is implemented in the CPU.

The second instruction that we will implement is:

Firstly, we will generate its 32 bit machine code which is as follows:

Right now our pc address points at this instruction (assume it is the first instruction of the code), and the state is initially **fetch**. We also assume that the instruction has been fetched from the inst\_ROM module based on the value of our PC and that the state has been updated to decoding. The decoding state triggers the inst\_decoder module to decode the instruction by changing the value of op\_reg to 110000000. The decoder module is designed to determine the instruction and its type of execution. It first uses a case statement on the seven least significant bits to check the opcode of the instruction and then, based on that, checks the 3-bit function code of the instruction (bits 12 to 14). Once the opcode and the 3-bit function code are known, it is easy to tell whether the second input is an immediate or a register. In this case:

opcode: 0110011 -> R-type instruction.

funct3: 000 & funct7: 0000000 -> ADD type operation.

Therefore the instruction is ADD: bits 15 to 19 are rs1, bits 20 to 24 are rs2, and bits 7 to 11 are the target register. After the type of instruction and the operation have been determined, the control state decides which values are needed to execute the ADD operation, that is, whether an immediate or a register value is used. Here ALUsrc is 0, because the ADD operation does not require an immediate value, and the state is updated to executing. The executing state performs the ADD operation using the ALU module, and ALUresult stores the result of the operation. The next state is writeback, which reads the data content from the RAM where needed and then supplies the input to the register module. In the case of the ADD instruction, it selects the output of ALUresult as the input of the register write. This is how this instruction is implemented throughout the project.
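The way the opcode, funct3 and funct7 together identify an R-type operation can be written as a lookup table. The Python sketch below uses the standard RISC-V encodings for the five R-type operations named in Part B; the names `R_TYPE_OPS` and `classify_r_type` are hypothetical and do not come from the project's decoder.

```python
R_TYPE_OPCODE = 0b0110011

# (funct3, funct7) -> operation, per the standard RISC-V encoding.
R_TYPE_OPS = {
    (0b000, 0b0000000): "ADD",
    (0b000, 0b0100000): "SUB",
    (0b001, 0b0000000): "SLL",
    (0b100, 0b0000000): "XOR",
    (0b110, 0b0000000): "OR",
}


def classify_r_type(inst):
    """Return the operation name, or None if inst is not a known R-type op."""
    if inst & 0x7F != R_TYPE_OPCODE:
        return None
    funct3 = (inst >> 12) & 0x7
    funct7 = (inst >> 25) & 0x7F
    return R_TYPE_OPS.get((funct3, funct7))
```

ADD and SUB share funct3 = 000, so funct7 is what tells them apart.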

The third instruction that we will implement is:

The machine code for this instruction is as follows:

This is also an R-type instruction, so it is handled in the same way as the previous two instructions.

- First, the control unit **fetches** the instruction: op\_reg triggers the load\_inst control signal, the instruction is fetched from the ROM module, and the state is updated to decoding.
- Secondly, the instruction is decoded by the inst\_decoder module in the same way as in the previous instruction implementations. From the opcode and the function bits we get an R-type instruction with a SUB execution.
- After decoding, the control state acts on the execution type. It gives the SUB command to the ALU module, which will perform the subtraction on the two registers, and the state is updated to executing.
- The executing state then sets the ALUenable control signal to 1, which makes the subtraction happen in the ALU module. The result is saved in ALUresult and the state changes to writeback.
- Based on the SUB execution, the output of ALUresult is selected as the input of the register write, and the state changes to change pc.
- The change pc state actually writes the result back into the register file and then updates the pc, triggered by the positive edge of inc\_pc (increment pc).
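The six steps above can be traced with a simplified Python walk-through of one SUB instruction. Everything here is a hypothetical model: the function `run_sub`, the list used as a register file and the example register numbers are illustrative, and the control signals are reduced to comments.

```python
MASK32 = 0xFFFFFFFF


def run_sub(regs, rs1, rs2, rd, pc=0):
    """Walk one SUB through the six states; return (trace, regs, pc)."""
    regs = list(regs)  # work on a copy of the register contents
    trace = []

    trace.append("fetch")       # load_inst set, instruction read from ROM
    trace.append("decoding")    # decoder yields rr1, rr2, wr and SUB
    rr1, rr2, wr = rs1, rs2, rd

    trace.append("control")     # ALU command = SUB, second operand = rd2
    data1, data2 = regs[rr1], regs[rr2]

    trace.append("executing")   # ALUenable = 1, subtraction performed
    alu_result = (data1 - data2) & MASK32

    trace.append("writeback")   # ALUresult selected as register write data
    write_data = alu_result

    trace.append("change pc")   # register file written, then pc incremented
    regs[wr] = write_data
    pc += 1
    return trace, regs, pc
```

With register 1 holding 10 and register 2 holding 3, `run_sub(regs, 1, 2, 3)` leaves 7 in register 3 and advances the pc by 1.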

## **PART E: Added Instruction**

The instruction we added to our project is the **AND** instruction, which is an R-type instruction. AND requires two source registers and one target register, so memory is not used in this case; only the register file is used to read and store the values. Initially, when the PC is updated to PC+4 and the next instruction is AND, the IF stage fetches the instruction from the ROM (Read Only Memory), which is our inst\_ROM module. In the second clock cycle, the instruction is decoded in the inst\_decoder module to identify the source registers and the destination register. AND has no immediate value, so no imm value needs to be generated. After the ID stage, during clock cycle 3, the EX stage executes the AND operation in the ALU module, using data1 and data2 as the values of the source registers, and stores the result in ALUresult. We now have the result of the AND operation, but it still has to be written back to the destination register. The MEM stage comes first, in clock cycle 4, but it is not required in this case. During clock cycle 5, the WB stage feeds the output of ALUresult as the write data, along with the write register (destination register), to the register module, which stores the value in the destination register. This was the design of our instruction in the project.

To add this instruction, we changed three of the modules: the ALU module, the inst\_decoder module and the control module. The changes in each of these modules were necessary to process the AND operation alongside the other RISC-V instructions. The changes made in each module are described in detail below:

#### 1. ALU.v

We added an extra output named "expected", which works like a test case: it generates the actual output of the AND operation so that we can compare it with the ALUresult output. We also added a parameter "AND = 5'b00011" for the command given to the logic unit. Then, inside the case statement of the always block, an AND branch was added to carry out the AND function between data1 and data2. This is how the AND operation is executed during the EX stage of the pipeline datapath.

#### 2. inst\_decoder.v

This module corresponds to the second stage of the pipeline datapath, the ID stage. A few changes were made to this module as well. First, alongside the existing 11-bit execution codes we added one more parameter, "AND = 11'b0000000011", which is fed as an input to the control module. For an R-type instruction, the decoder determines the execution bits from the opcode and function 3 bits and sends them to the control unit. To make that work for AND, we added one more case inside the R-type block, which becomes active only when the seven least significant bits are 0110011 and the function 3 bits are 111. That gives us the rr1, rr2 and wr registers used in the AND operation.

#### 3. control.v

Inside the control unit, we added the matching parameters ALUAND = 5'b00011 and AND = 11'b00000000011 to identify the ALUcommand and the execution respectively. Inside the execution stage, we also added one more case, AND, to send the respective control signals and command to the specific modules that carry out the AND function. For the WB stage, the process is the same as for the other R-type instructions, so no major changes were required there.

Below is the summarized table of the modifications made into the modules:

| BEFORE | AFTER |
|--------|-------|
| 11 parameters for the 11-bit execution code in the inst_decoder module, with no AND execution code. | 12 parameters for the execution code after adding the AND operation: parameter AND = 11'b0000000011; |
| 5 if conditions in the R-type instruction block of the inst_decoder module, for ADD, SUB, OR, XOR and SLL. | 6 if conditions in the R-type instruction block of the inst_decoder module after adding the AND condition. |
| 5 command parameters in the *ALU* module (SUB, ADD, SLL, OR, XOR). | 6 command parameters after adding the command for AND: parameter AND = 5'b00011; |
| 5 cases in the *ALU* module, with no AND case to carry out the operation. | 6 cases after adding the AND case: ALUresult_r <= data1 & data2; |
| No extra output to check whether ALUresult is correct. | One added output, output [31:0] expected, which holds the actual AND result so that it can be compared with ALUresult. |
| 5 parameters for *ALUcommand* and 11 parameters for the execution code in the control unit. | 6 parameters for ALUcommand and 12 parameters for the execution code after adding ALUAND = 5'b00011 and AND = 11'b0000000011. |
| No AND case in the *control* stage of the control unit. | Added an AND case to send the appropriate control signals and ALUcommand to the appropriate modules. |
| No changes to the *executing* and *writeback* stages of the control unit. | AND is an R-type instruction, so no memory is used when writing back the result; the process remains the same as for the other R-type instructions, with the same active control signals. |

## **PART F: Functionality Of Relevant Modules**

The functionality of the relevant modules is shown below:

#### • inst\_decoder module functionality:

From 0 to 100 ns the value of execution is 0, because the first rising edge of dec\_en occurs at 100 ns; from that point on, the always block in the inst\_decoder module executes.

a.



| BINARY                                | HEXADECIMAL |
|---------------------------------------|-------------|
| 0000000_00010_00001_000_00011_0110011 | 002081B3    |
| 0000_0001_0000                        | 010         |

b.



#### • ALU module functionality:

From 0 to 100 ns the value of ALUresult is 0, because the first rising edge occurs at 100 ns; from that point on, the always block in the ALU module executes.

a.



b.



The result is B because the XOR of 11010 and 10001 is 01011, and 01011 converted to hexadecimal is B.



Here we OR ABCD with FFFF, so the result is FFFF.

| BINARY                                  | HEXADECIMAL |
|-----------------------------------------|-------------|
| 0000_0000_0000_0000_1010_1011_1100_1101 | 0000ABCD    |
| 0000_0000_0000_0000_1111_1111_1111_1111 | 0000FFFF    |
| 10000                                   | 10          |
| 0000_0000_0000_0000_1111_1111_1111_1111 | 0000FFFF    |
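Both waveform results discussed in this section can be double-checked with plain integer arithmetic. The short Python snippet below is only a sanity check of the XOR and OR values quoted in the text; the variable names are illustrative.

```python
# XOR example: 11010 ^ 10001
xor_result = 0b11010 ^ 0b10001
print(f"{xor_result:05b} -> {xor_result:X}")

# OR example: 0000ABCD | 0000FFFF
or_result = 0x0000ABCD | 0x0000FFFF
print(f"{or_result:08X}")
```

The first line prints the 5-bit pattern 01011 together with its hexadecimal digit B, and the second prints 0000FFFF.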

#### • inst\_ROM functionality:



In the ROM module we supply the instruction address as input, and the module returns the instruction stored at that index. On a rising edge of load\_inst it outputs the instruction from the ROM file at that address; on a falling edge it outputs "zzzzzzzz" in hexadecimal. In the illustration above we supplied an arbitrary hexadecimal instruction address: when load\_inst has a positive edge the module outputs the corresponding 32-bit instruction, and when it has a negative edge the module outputs "zzzzzzzz".

| BINARY                                  | HEXADECIMAL |
|-----------------------------------------|-------------|
| 0000_0000_0000_0000_0010_1010_0011_1100 | 00002A3C    |
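The ROM behaviour seen in the waveform can be approximated with a small Python function. This is a simplified sketch: the function `rom_read` is hypothetical, load\_inst is treated as a level rather than an edge, and the ROM contents are two instruction words reused from earlier in this report, not the project's actual ROM file.

```python
HIGH_Z = "zzzzzzzz"  # what the waveform shows when nothing is driven

# Illustrative ROM contents (assumption, not the real ROM file).
EXAMPLE_ROM = [0x002081B3, 0x0062C4B3]


def rom_read(rom, inst_addr, load_inst):
    """Return the instruction at inst_addr as hex text, or high impedance."""
    if load_inst:
        return f"{rom[inst_addr]:08x}"
    return HIGH_Z
```

With load\_inst high the function returns the stored word as eight hexadecimal digits; with load\_inst low it returns the high-impedance marker.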

#### • Added instruction (AND) functionality in ALU module:



In the waveform above, suppose we store the value "0xF0F0F0F0" in register x3 and "0xFF00FF00" in register x4. After the execution stage, the ALU module produces the result "0xF000F000". We added a test case to check that the operation is implemented correctly, and as we can see, the ALUresult output matches the expected output, so it is working correctly.

| BINARY                                  | HEXADECIMAL |
|-----------------------------------------|-------------|
| 1111_0000_1111_0000_1111_0000_1111_0000 | F0F0F0F0    |
| 1111_1111_0000_0000_1111_1111_0000_0000 | FF00FF00    |
| 1111_0000_0000_0000_1111_0000_0000_0000 | F000F000    |
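The comparison between ALUresult and the expected output can be mimicked in Python by computing the AND in two independent ways. The function names `and32` and `and32_bitwise` are hypothetical; the second one plays the role of the "expected" reference and builds the result one bit at a time.

```python
MASK32 = 0xFFFFFFFF


def and32(data1, data2):
    """AND as the ALU would compute it, limited to 32 bits."""
    return data1 & data2 & MASK32


def and32_bitwise(data1, data2):
    """Reference result: set a bit only if it is 1 in both operands."""
    result = 0
    for bit in range(32):
        if (data1 >> bit) & 1 and (data2 >> bit) & 1:
            result |= 1 << bit
    return result


alu_result = and32(0xF0F0F0F0, 0xFF00FF00)
expected = and32_bitwise(0xF0F0F0F0, 0xFF00FF00)
print(f"{alu_result:08X}", alu_result == expected)
```

Both computations agree on F000F000, which is the value listed in the table above.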

#### • Added instruction (AND) functionality in inst\_decoder module:



This is the instruction decoder module. We passed it the 32-bit code of the instruction "and x5, x3, x4". rr1 and rr2 are source registers 1 and 2 respectively, and wr is the target register. The 11-bit execution code, 0000000011, is the code assigned to AND; it is sent to the control module, which then issues the matching command to the ALU module. The waveform therefore shows that the decoder is decoding the instruction correctly.
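The decoder outputs for "and x5, x3, x4" can be reproduced with a short Python sketch. It assumes the standard RISC-V R-type encoding, under which this instruction is the word 0x0041F2B3; the function `decode_and` is hypothetical, and the execution code 0b0000000011 is the value the report assigns to AND.

```python
AND_EXECUTION = 0b0000000011  # execution code assigned to AND in the report
R_TYPE_OPCODE = 0b0110011


def decode_and(inst):
    """Return (rr1, rr2, wr, execution) if inst is an AND, else None."""
    opcode = inst & 0x7F
    funct3 = (inst >> 12) & 0x7
    funct7 = (inst >> 25) & 0x7F
    if opcode != R_TYPE_OPCODE or funct3 != 0b111 or funct7 != 0:
        return None
    rr1 = (inst >> 15) & 0x1F  # source register 1
    rr2 = (inst >> 20) & 0x1F  # source register 2
    wr = (inst >> 7) & 0x1F    # target register
    return rr1, rr2, wr, AND_EXECUTION


print(decode_and(0x0041F2B3))
```

For this word the sketch reports rr1 = 3, rr2 = 4 and wr = 5, matching the registers named in the instruction.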

## **PART G: Project Breakdown**

| Member          | Contributions                                                 |
|-----------------|---------------------------------------------------------------|
| Priyank Patel   | - RISC-V instructions implementation<br>- module explanations |
| Kishan Patel    | - module diagrams<br>- functionality of relevant modules      |
| Harpreet Goraya | - explanation of a system<br>- full design diagram            |
| Kairav Naik     | - RISC-V instructions implementation<br>- conclusion          |

## **PART H: Conclusion**

In milestone 1, we learned how the aforementioned modules work together to process a RISC-V instruction in the CPU. The CPU itself is an interwoven web of different components/modules which together form a system. The control module acts as the brain and decides what the other modules will process. The different parts of an instruction are decoded, and the components (i.e. registers, branch address) are split off to different modules. The data finally ends up in the ALU, which produces the result that the control module can then use to perform an action. All in all, this milestone introduced us to how a CPU is created.

In milestone 2, we used our understanding from milestone 1 and implemented our own instruction (AND) in the project. This milestone was more practical than the first, which gave us an even better understanding of how the different modules are connected and interact with each other. In addition, we learned some basics of the Quartus software, as well as simulation waveforms, which allowed us to test our implementation. Overall, this project taught us the importance of low-level languages and gave us a different perspective on how programs are built.